由于缺乏异常样品,因此仅具有正常样本的先验知识的异常检测才吸引更多的注意力。现有的基于CNN的像素重建方法遇到了两个问题。首先,重建源和目标是包含无法区分的语义信息的原始像素值。其次,CNN倾向于很好地重建正常样品和异常情况,使它们仍然很难区分。在本文中,我们提出异常检测变压器(ADTR)将变压器应用于重建预训练的特征。预训练的功能包含可区分的语义信息。同样,采用变压器限制以很好地重构异常,因此一旦重建失败,就可以轻松检测到异常。此外,我们提出了新的损失函数,使我们的方法与正常样本的情况以及具有图像级和像素级标记为异常的异常情况兼容。通过添加简单的合成或外部无关异常,可以进一步提高性能。广泛的实验是在包括MVTEC-AD和CIFAR-10在内的异常检测数据集上进行的。与所有基线相比,我们的方法取得了卓越的性能。
translated by 谷歌翻译
尽管无监督的异常检测迅速发展,但现有的方法仍需要训练不同对象的单独模型。在这项工作中,我们介绍了完成具有统一框架的多个类别的异常检测。在如此具有挑战性的环境下,流行的重建网络可能属于“相同的快捷方式”,在这种捷径中,正常样本和异常样本都可以很好地恢复,因此无法发现异常值。为了解决这一障碍,我们取得了三个改进。首先,我们重新审视完全连接的层,卷积层以及注意力层的配方,并确认查询嵌入(即注意层内)在防止网络学习快捷键方面的重要作用。因此,我们提出了一个层的查询解码器,以帮助建模多级分布。其次,我们采用一个邻居掩盖的注意模块,以进一步避免从输入功能到重建的输出功能的信息泄漏。第三,我们提出了一种功能抖动策略,即使使用嘈杂的输入,也敦促模型恢复正确的消息。我们在MVTEC-AD和CIFAR-10数据集上评估了我们的算法,在该数据集中,我们通过足够大的利润率超过了最先进的替代方案。例如,当在MVTEC-AD中学习15个类别的统一模型时,我们在异常检测的任务(从88.1%到96.5%)和异常定位(从89.5%到96.8%)上超过了第二个竞争者。代码将公开可用。
translated by 谷歌翻译
这项工作研究了很少的对象计数的问题,该问题计算了查询图像中出现的示例对象的数量(即由一个或几个支持图像描述)。主要的挑战在于,目标对象可以密集地包装在查询图像中,从而使每个单一对象都很难识别。为了解决障碍,我们提出了一个新颖的学习块,配备了相似性比较模块和功能增强模块。具体来说,给定支持图像和查询图像,我们首先通过比较每个空间位置的投影特征来得出分数图。有关所有支持图像的得分图将共收集在一起,并在示例维度和空间维度上均标准化,从而产生可靠的相似性图。然后,我们通过使用开发的点相似性作为加权系数来增强使用支持功能的查询功能。这样的设计鼓励模型通过更多地关注类似于支持图像的区域来检查查询图像,从而导致不同对象之间的界限更加清晰。在各种基准和培训设置上进行了广泛的实验表明,我们通过足够大的边距超过了最先进的方法。例如,在最近的大规模FSC-147数据集中,我们通过将平均绝对误差从22.08提高到14.32(35%$ \ uparrow $)来超越最新方法。代码已在https://github.com/zhiyuanyou/safecount中发布。
translated by 谷歌翻译
Recent years witnessed the breakthrough of face recognition with deep convolutional neural networks. Dozens of papers in the field of FR are published every year. Some of them were applied in the industrial community and played an important role in human life such as device unlock, mobile payment, and so on. This paper provides an introduction to face recognition, including its history, pipeline, algorithms based on conventional manually designed features or deep learning, mainstream training, evaluation datasets, and related applications. We have analyzed and compared state-of-the-art works as many as possible, and also carefully designed a set of experiments to find the effect of backbone size and data distribution. This survey is a material of the tutorial named The Practical Face Recognition Technology in the Industrial World in the FG2023.
translated by 谷歌翻译
A central challenge of building more powerful Graph Neural Networks (GNNs) is the oversmoothing phenomenon, where increasing the network depth leads to homogeneous node representations and thus worse classification performance. While previous works have only demonstrated that oversmoothing is inevitable when the number of graph convolutions tends to infinity, in this paper, we precisely characterize the mechanism behind the phenomenon via a non-asymptotic analysis. Specifically, we distinguish between two different effects when applying graph convolutions -- an undesirable mixing effect that homogenizes node representations in different classes, and a desirable denoising effect that homogenizes node representations in the same class. By quantifying these two effects on random graphs sampled from the Contextual Stochastic Block Model (CSBM), we show that oversmoothing happens once the mixing effect starts to dominate the denoising effect, and the number of layers required for this transition is $O(\log N/\log (\log N))$ for sufficiently dense graphs with $N$ nodes. We also extend our analysis to study the effects of Personalized PageRank (PPR) on oversmoothing. Our results suggest that while PPR mitigates oversmoothing at deeper layers, PPR-based architectures still achieve their best performance at a shallow depth and are outperformed by the graph convolution approach on certain graphs. Finally, we support our theoretical results with numerical experiments, which further suggest that the oversmoothing phenomenon observed in practice may be exacerbated by the difficulty of optimizing deep GNN models.
translated by 谷歌翻译
The Shapley value (SV) is adopted in various scenarios in machine learning (ML), including data valuation, agent valuation, and feature attribution, as it satisfies their fairness requirements. However, as exact SVs are infeasible to compute in practice, SV estimates are approximated instead. This approximation step raises an important question: do the SV estimates preserve the fairness guarantees of exact SVs? We observe that the fairness guarantees of exact SVs are too restrictive for SV estimates. Thus, we generalise Shapley fairness to probably approximate Shapley fairness and propose fidelity score, a metric to measure the variation of SV estimates, that determines how probable the fairness guarantees hold. Our last theoretical contribution is a novel greedy active estimation (GAE) algorithm that will maximise the lowest fidelity score and achieve a better fairness guarantee than the de facto Monte-Carlo estimation. We empirically verify GAE outperforms several existing methods in guaranteeing fairness while remaining competitive in estimation accuracy in various ML scenarios using real-world datasets.
translated by 谷歌翻译
Privacy in AI remains a topic that draws attention from researchers and the general public in recent years. As one way to implement privacy-preserving AI, differentially private learning is a framework that enables AI models to use differential privacy (DP). To achieve DP in the learning process, existing algorithms typically limit the magnitude of gradients with a constant clipping, which requires carefully tuned due to its significant impact on model performance. As a solution to this issue, latest works NSGD and Auto-S innovatively propose to use normalization instead of clipping to avoid hyperparameter tuning. However, normalization-based approaches like NSGD and Auto-S rely on a monotonic weight function, which imposes excessive weight on small gradient samples and introduces extra deviation to the update. In this paper, we propose a Differentially Private Per-Sample Adaptive Clipping (DP-PSAC) algorithm based on a non-monotonic adaptive weight function, which guarantees privacy without the typical hyperparameter tuning process of using a constant clipping while significantly reducing the deviation between the update and true batch-averaged gradient. We provide a rigorous theoretical convergence analysis and show that with convergence rate at the same order, the proposed algorithm achieves a lower non-vanishing bound, which is maintained over training iterations, compared with NSGD/Auto-S. In addition, through extensive experimental evaluation, we show that DP-PSAC outperforms or matches the state-of-the-art methods on multiple main-stream vision and language tasks.
translated by 谷歌翻译
Recently, there has been significant progress in teaching language models to perform step-by-step reasoning to solve complex numerical reasoning tasks. Chain-of-thoughts prompting (CoT) is by far the state-of-art method for these tasks. CoT uses language models to perform both reasoning and computation in the multi-step `thought' process. To disentangle computation from reasoning, we propose `Program of Thoughts' (PoT), which uses language models (mainly Codex) to express the reasoning process as a program. The computation is relegated to an external computer, which executes the generated programs to derive the answer. We evaluate PoT on five math word problem datasets (GSM, AQuA, SVAMP, TabMWP, MultiArith) and three financial-QA datasets (FinQA, ConvFinQA, TATQA) for both few-shot and zero-shot setups. Under both few-shot and zero-shot settings, PoT can show an average performance gain over CoT by around 12\% across all the evaluated datasets. By combining PoT with self-consistency decoding, we can achieve SoTA performance on all math problem datasets and near-SoTA performance on financial datasets. All of our data and code are released in Github\footnote{\url{https://github.com/wenhuchen/Program-of-Thoughts}}.
translated by 谷歌翻译
In recent years, aerial swarm technology has developed rapidly. In order to accomplish a fully autonomous aerial swarm, a key technology is decentralized and distributed collaborative SLAM (CSLAM) for aerial swarms, which estimates the relative pose and the consistent global trajectories. In this paper, we propose $D^2$SLAM: a decentralized and distributed ($D^2$) collaborative SLAM algorithm. This algorithm has high local accuracy and global consistency, and the distributed architecture allows it to scale up. $D^2$SLAM covers swarm state estimation in two scenarios: near-field state estimation for high real-time accuracy at close range and far-field state estimation for globally consistent trajectories estimation at the long-range between UAVs. Distributed optimization algorithms are adopted as the backend to achieve the $D^2$ goal. $D^2$SLAM is robust to transient loss of communication, network delays, and other factors. Thanks to the flexible architecture, $D^2$SLAM has the potential of applying in various scenarios.
translated by 谷歌翻译
Graph neural networks (GNNs) have demonstrated excellent performance in a wide range of applications. However, the enormous size of large-scale graphs hinders their applications under real-time inference scenarios. Although existing scalable GNNs leverage linear propagation to preprocess the features and accelerate the training and inference procedure, these methods still suffer from scalability issues when making inferences on unseen nodes, as the feature preprocessing requires the graph is known and fixed. To speed up the inference in the inductive setting, we propose a novel adaptive propagation order approach that generates the personalized propagation order for each node based on its topological information. This could successfully avoid the redundant computation of feature propagation. Moreover, the trade-off between accuracy and inference latency can be flexibly controlled by simple hyper-parameters to match different latency constraints of application scenarios. To compensate for the potential inference accuracy loss, we further propose Inception Distillation to exploit the multi scale reception information and improve the inference performance. Extensive experiments are conducted on four public datasets with different scales and characteristics, and the experimental results show that our proposed inference acceleration framework outperforms the SOTA graph inference acceleration baselines in terms of both accuracy and efficiency. In particular, the advantage of our proposed method is more significant on larger-scale datasets, and our framework achieves $75\times$ inference speedup on the largest Ogbn-products dataset.
translated by 谷歌翻译